Self-Modeling Agents and Reward Generator Corruption

Author

  • Bill Hibbard
Abstract

Hutter's universal artificial intelligence (AI) showed how to define future AI systems by mathematical equations. Here we adapt those equations to define a self-modeling framework, where AI systems learn models of their own calculations of future values. Hutter discussed the possibility that AI agents may maximize rewards by corrupting the source of rewards in the environment. Here we propose a way to avoid such corruption in the self-modeling framework. This paper fits in the context of my book Ethical Artificial Intelligence. A draft of the book is available at: arxiv.org/abs/1411.1373.

Self-Modeling Agents

Russell and Norvig defined a framework for AI agents interacting with an environment (Russell and Norvig 2010). Hutter adapted Solomonoff's theory of sequence prediction to this framework to produce mathematical equations that define behaviors of future AI systems (Hutter 2005).

Assume that an agent interacts with its environment in a discrete, finite series of time steps t ∈ {0, 1, 2, ..., T}. The agent sends an action a_t ∈ A to the environment and receives an observation o_t ∈ O from the environment, where A and O are finite sets. We use h = (a_1, o_1, ..., a_t, o_t) to denote an interaction history where the environment produces observation o_i in response to action a_i for 1 ≤ i ≤ t. Let H be the set of all finite histories so that h ∈ H, and define |h| = t as the length of the history h. An agent's predictions of its observations are uncertain, so the agent's environment model takes the form of a probability distribution over interaction histories:
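As a minimal sketch of the notation above, and not the paper's own environment model (whose equation is not reproduced in this excerpt), a history can be represented as a tuple of (action, observation) pairs. The toy model below is a hypothetical assumption made only for illustration: each observation depends solely on the current action. The names history_probability, total_probability and TOY_MODEL are likewise illustrative and do not come from the source.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

Action = str
Observation = str
# A history h = (a_1, o_1, ..., a_t, o_t), stored as t (action, observation) pairs.
History = Tuple[Tuple[Action, Observation], ...]
# Hypothetical toy model: P(observation | current action) only.
ObsModel = Dict[Action, Dict[Observation, float]]


def history_length(h: History) -> int:
    """Return |h| = t, the number of interaction steps in h."""
    return len(h)


def history_probability(h: History, obs_model: ObsModel) -> float:
    """Probability of the observations in h given its actions (toy assumption)."""
    p = 1.0
    for action, observation in h:
        p *= obs_model[action].get(observation, 0.0)
    return p


def total_probability(actions: Sequence[Action],
                      observations: Sequence[Observation],
                      obs_model: ObsModel) -> float:
    """Sum the probabilities of every observation sequence for fixed actions."""
    total = 0.0
    for obs_seq in product(observations, repeat=len(actions)):
        h = tuple(zip(actions, obs_seq))
        total += history_probability(h, obs_model)
    return total


# Illustrative numbers only; A = {"left", "right"}, O = {"0", "1"}.
TOY_MODEL: ObsModel = {
    "left": {"0": 0.5, "1": 0.5},
    "right": {"0": 0.25, "1": 0.75},
}

h: History = (("left", "0"), ("right", "1"))
print(history_length(h), history_probability(h, TOY_MODEL))
```

For a fixed action sequence, the probabilities of all possible observation sequences sum to one, which is what total_probability checks for this toy model.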

Similar resources

Issn 1045-6333 Corruption and Optimal Law Enforcement

We analyze corruption in law enforcement: the payment of bribes to enforcement agents, threats to frame innocent individuals in order to extort money from them, and the actual framing of innocent individuals. Bribery, extortion, and framing reduce deterrence and are thus worth discouraging. Optimal penalties for bribery and framing are maximal, but, surprisingly, extortion should not be sanctio...


Reinforcement Learning with a Corrupted Reward Channel

No real-world reward function is perfect. Sensory errors and software bugs may result in agents getting higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. We formalise this problem as a generalised Markov Decision Problem called Corrupt Reward MD...


Reward Self-Reporting to Deter Corruption: An Experiment on Mitigating Collusive Bribery

This paper investigates the effectiveness of offering rewards for self reports as a means of combating collusive bribery. Rewarding self reporting theoretically sows distrust between parties tempted to exchange bribes and may reduce bribery even where authorities are otherwise ineffective in uncovering corruption. Our results indicate that offering rewards is weakly effective in reducing collus...


Modelling structural relations of craving based on sensitivity to reinforcement, distress tolerance and self-Compassion with the mediating role of self-efficacy for quitting

Background & Objectives: Craving is a major barrier to the effective treatment of substance addiction. This study was conducted to model the structural relations of craving based on sensitivity to reinforcement, distress tolerance and self-compassion, with the mediating role of self-efficacy for quitting. Materials and Methods: The method of this study was descriptive-correlational. The...


Publication date: 2015